HTM: A Topic Model for Hypertexts

Authors

  • Congkai Sun
  • Bin Gao
  • Zhenfu Cao
  • Hang Li
Abstract

Previously, topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recently, topic models for processing hypertexts such as web pages have also been proposed. These hypertext models are generative models that give rise to both words and hyperlinks. This paper points out that, to better represent the contents of hypertexts, it is more essential to assume that the hyperlinks are fixed and to define the topic model as one that generates words only. The paper then proposes a new topic model for hypertext processing, referred to as the Hypertext Topic Model (HTM). HTM defines the distribution of words in a document (i.e., the content of the document) as a mixture over latent topics in the document itself and latent topics in the documents that the document cites. The topics are further characterized as distributions of words, as in conventional topic models. This paper further proposes a method for learning the HTM model. Experimental results show that HTM outperforms the baselines on topic discovery and document classification on three datasets.
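
The mixture described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation or learning method: the function name `htm_word_distribution`, the fixed mixing weight `own_weight`, and the uniform weighting of cited documents are all assumptions made here for illustration, and the toy numbers are invented. The sketch only shows how a document's word distribution could combine its own topic proportions with those of the documents it cites, with each topic being a distribution over words.

```python
def htm_word_distribution(own_topics, cited_topics, topic_word, own_weight=0.5):
    """Return p(word | document) as a mixture over latent topics.

    own_topics   : topic proportions of the document itself (sums to 1)
    cited_topics : list of topic proportions, one per cited document
    topic_word   : topic_word[k][w] = p(word w | topic k), each row sums to 1
    own_weight   : ASSUMED fixed share given to the document's own topics;
                   the remainder is split uniformly over cited documents
                   (also an assumption, chosen for simplicity)
    """
    num_topics = len(topic_word)
    vocab_size = len(topic_word[0])

    if cited_topics:
        cited_share = (1.0 - own_weight) / len(cited_topics)
        mixed = [
            own_weight * own_topics[k]
            + cited_share * sum(cited[k] for cited in cited_topics)
            for k in range(num_topics)
        ]
    else:
        # With no outgoing hyperlinks, only the document's own topics are used.
        mixed = list(own_topics)

    # Marginalize over topics to obtain the distribution over words.
    return [
        sum(mixed[k] * topic_word[k][w] for k in range(num_topics))
        for w in range(vocab_size)
    ]


# Toy example (invented numbers): 2 topics, 3 vocabulary words, 1 cited document.
topic_word = [[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]]
p_words = htm_word_distribution([0.9, 0.1], [[0.2, 0.8]], topic_word)
print(p_words)
```

Because the hyperlinks are treated as fixed input rather than generated, the cited documents enter only through their topic proportions; the result is still a proper probability distribution over the vocabulary.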

Similar resources

Investigation of the Effect of Band Offset and Mobility of Organic/Inorganic HTM Layers on the Performance of Perovskite Solar Cells

Abstract: Perovskite solar cells have become an attractive subject in the solar energy device area. During ten years of development, the energy conversion efficiency has been improved from 2.2% to more than 22%, and it still has a very good potential for further enhancement. In this paper, a numerical model of the perovskite solar cell with the structure of glass/ FTO/ TiO2/...

HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades

Understanding the diffusion of information in social networks and social media requires modeling the text diffusion process. In this work, we develop the HawkesTopic model (HTM) for analyzing text-based cascades, such as “retweeting a post” or “publishing a follow-up blog post.” HTM combines Hawkes processes and topic modeling to simultaneously reason about the information diffusion pathways an...

Specification and Design of Workflow-Driven Hypertexts

At present, the web combines several applications and seems to be everywhere. As a result, web applications are changing to meet new requirements such as the management of multiple users and complex dataflow. Brambilla, Ceri et al., in their article “Specification and Design of Workflow-Driven Hypertexts” (2002), introduce workflow-driven hypertexts, which are web-enabled hypertextual applications that...

Components of a Model of Context-Sensitive Hypertexts

Against the background of growing Intranet applications, the automatic generation of adaptable, context-sensitive hypertexts is becoming more and more important [El-Beltagy et al., 2001]. This observation contradicts the literature on hypertext authoring, where Information Retrieval techniques prevail, which disregard any linguistic and context-theoretical underpinning. As a consequence, resulting hyperte...

The biologically inspired Hierarchical Temporal Memory

This work proposes a handwritten digit recognition system that is biologically inspired by the large-scale structure of the mammalian neocortex. Hierarchical Temporal Memory (HTM) is a memory-prediction network model that takes advantage of Bayesian belief propagation and revision techniques. In this article, a study has been conducted to train an HTM network to recognize handwritten digits ...

Publication date: 2008